Abstract

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This 
technical report describes a detailed set of mathematical performance models for describing the 
execution of a MapReduce job on Hadoop. The models describe dataflow and cost information 
at the fine granularity of phases within the map and reduce tasks of a job execution. The 
models can be used to estimate the performance of MapReduce jobs as well as to find the 
optimal configuration settings to use when running the jobs. 

The execution of a MapReduce job is broken down into map tasks and reduce tasks. Map task
execution is divided into five phases: Read (reading map inputs), Map (map function processing),
Collect (serializing to buffer and partitioning), Spill (sorting, combining, compressing, and
writing map outputs to local disk), and Merge (merging sorted spill files). Reduce task execution
is divided into four phases: Shuffle (transferring map outputs to reduce tasks, with decompression
if needed), Merge (merging sorted map outputs), Reduce (reduce function processing), and Write
(writing reduce outputs to the distributed file system). Each phase represents an important part of
the job's overall execution in Hadoop. We have developed performance models for each task phase,
which are then combined to form the overall MapReduce job model.

1 Model Parameters 

The performance models rely on a set of parameters to estimate the cost of a MapReduce job. We
separate the parameters into three categories:

1. Hadoop Parameters: A set of Hadoop-defined configuration parameters that affect the
execution of a job

2. Profile Statistics: A set of statistics specifying properties of the input data and the
user-defined functions (Map, Reduce, Combine)

3. Profile Cost Factors: A set of parameters that define the I/O, CPU, and network cost of a
job execution

| Variable | Hadoop Parameter | Default Value | Effect |
|---|---|---|---|
| pNumNodes | Number of nodes | | System |
| pTaskMem | mapred.child.java.opts | -Xmx200m | System |
| pMaxMapsPerNode | mapred.tasktracker.map.tasks.max | 2 | System |
| pMaxRedPerNode | mapred.tasktracker.reduce.tasks.max | 2 | System |
| pNumMappers | mapred.map.tasks | | Job |
| pSortMB | io.sort.mb | 100 MB | Job |
| pSpillPerc | io.sort.spill.percent | 0.8 | Job |
| pSortRecPerc | io.sort.record.percent | 0.05 | Job |
| pSortFactor | io.sort.factor | 10 | Job |
| pNumSpillsForComb | min.num.spills.for.combine | 3 | Job |
| pNumReducers | mapred.reduce.tasks | | Job |
| pInMemMergeThr | mapred.inmem.merge.threshold | 1000 | Job |
| pShuffleInBufPerc | mapred.job.shuffle.input.buffer.percent | 0.7 | Job |
| pShuffleMergePerc | mapred.job.shuffle.merge.percent | 0.66 | Job |
| pReducerInBufPerc | mapred.job.reduce.input.buffer.percent | 0 | Job |
| pUseCombine | mapred.combine.class or mapreduce.combine.class | null | Job |
| pIsIntermCompressed | mapred.compress.map.output | false | Job |
| pIsOutCompressed | mapred.output.compress | false | Job |
| pReduceSlowstart | mapred.reduce.slowstart.completed.maps | 0.05 | Job |
| pIsInCompressed | Whether the input is compressed or not | | Input |
| pSplitSize | The size of the input split | | Input |

Table 1: Variables for Hadoop Parameters

Table 1 defines the variables that are associated with Hadoop parameters.

Table 2 defines the necessary profile statistics specific to a job and the data it is processing.



| Variable | Description |
|---|---|
| sInputPairWidth | The average width of the input K-V pairs |
| sMapSizeSel | The selectivity of the mapper in terms of size |
| sMapPairsSel | The selectivity of the mapper in terms of number of K-V pairs |
| sReduceSizeSel | The selectivity of the reducer in terms of size |
| sReducePairsSel | The selectivity of the reducer in terms of number of K-V pairs |
| sCombineSizeSel | The selectivity of the combine function in terms of size |
| sCombinePairsSel | The selectivity of the combine function in terms of number of K-V pairs |
| sInputCompressRatio | The ratio of compression for the input data |
| sIntermCompressRatio | The ratio of compression for the intermediate map output |
| sOutCompressRatio | The ratio of compression for the final output of the job |

Table 2: Variables for Profile Statistics



Table 3 defines system-specific parameters needed for calculating I/O, CPU, and network costs.
The I/O costs and the CPU costs related to compression are defined in terms of time per byte. The
remaining CPU costs are defined in terms of time per K-V pair. The network cost is defined in
terms of transfer time per byte.

| Variable | Description |
|---|---|
| cHdfsReadCost | The cost for reading from HDFS |
| cHdfsWriteCost | The cost for writing to HDFS |
| cLocalIOCost | The cost for performing I/O from the local disk |
| cNetworkCost | The network transferring cost |
| cMapCPUCost | The CPU cost for executing the map function |
| cReduceCPUCost | The CPU cost for executing the reduce function |
| cCombineCPUCost | The CPU cost for executing the combine function |
| cPartitionCPUCost | The CPU cost for partitioning |
| cSerdeCPUCost | The CPU cost for serialization |
| cSortCPUCost | The CPU cost for sorting on keys |
| cMergeCPUCost | The CPU cost for merging |
| cInUncomprCPUCost | The CPU cost for uncompressing the input data |
| cIntermUncomprCPUCost | The CPU cost for uncompressing the intermediate data |
| cIntermComprCPUCost | The CPU cost for compressing the intermediate data |
| cOutComprCPUCost | The CPU cost for compressing the output data |

Table 3: Variables for Profile Cost Factors



We define the indicator function I as:

$$ I(x) = \begin{cases} 1, & \text{if } x \text{ exists or equals true} \\ 0, & \text{otherwise} \end{cases} \tag{1} $$

Initializations: In an effort to present concise formulas and avoid the use of conditionals as much
as possible, we make the following initializations:

    If (pUseCombine == FALSE)
        sCombineSizeSel = 1
        sCombinePairsSel = 1
        cCombineCPUCost = 0

    If (pIsInCompressed == FALSE)
        sInputCompressRatio = 1
        cInUncomprCPUCost = 0

    If (pIsIntermCompressed == FALSE)
        sIntermCompressRatio = 1
        cIntermUncomprCPUCost = 0
        cIntermComprCPUCost = 0

    If (pIsOutCompressed == FALSE)
        sOutCompressRatio = 1
        cOutComprCPUCost = 0
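
The initializations above can be sketched as a small normalization step in Python. This is an illustrative sketch only: the dictionary layout and the function name `apply_default_initializations` are assumptions made for this example, the keys mirror the variable names of Tables 1 to 3, and the cost factors of disabled features are assumed to be reset to 0.

```python
def apply_default_initializations(profile):
    """Return a copy of `profile` with neutral values for disabled features.

    `profile` is a plain dict keyed by the variable names of Tables 1-3
    (a hypothetical layout chosen for illustration).
    """
    p = dict(profile)  # leave the caller's dict untouched
    if not p.get("pUseCombine"):
        p.update(sCombineSizeSel=1.0, sCombinePairsSel=1.0, cCombineCPUCost=0.0)
    if not p.get("pIsInCompressed"):
        p.update(sInputCompressRatio=1.0, cInUncomprCPUCost=0.0)
    if not p.get("pIsIntermCompressed"):
        p.update(sIntermCompressRatio=1.0,
                 cIntermUncomprCPUCost=0.0,
                 cIntermComprCPUCost=0.0)
    if not p.get("pIsOutCompressed"):
        p.update(sOutCompressRatio=1.0, cOutComprCPUCost=0.0)
    return p
```

With these neutral values in place, the later formulas can multiply by a selectivity or add a compression cost unconditionally.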
2 Performance Models for the Map Task Phases

The Map Task execution is divided into five phases:

1. Read: Reading the input split and creating the key-value pairs.

2. Map: Executing the user-provided map function. 

3. Collect: Collecting the map output into a buffer and partitioning. 

4. Spill: Sorting, using the combiner if any, performing compression if asked, and finally spilling 
to disk, creating file spills. 

5. Merge: Merging the file spills into a single map output file. Merging might be performed in 
multiple rounds. 

2.1 Modeling the Read and Map Phases 

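During these phases, the input split is read and uncompressed if necessary, and the key-value
pairs are created and passed as input to the user-defined map function.

$$ \mathit{inputMapSize} = \frac{\mathit{pSplitSize}}{\mathit{sInputCompressRatio}} \tag{2} $$

$$ \mathit{inputMapPairs} = \frac{\mathit{inputMapSize}}{\mathit{sInputPairWidth}} \tag{3} $$

The costs of these phases are:

$$
\begin{aligned}
\mathit{IOCost}_{Read} &= \mathit{pSplitSize} \times \mathit{cHdfsReadCost} \\
\mathit{CPUCost}_{Read} &= \mathit{pSplitSize} \times \mathit{cInUncomprCPUCost} + \mathit{inputMapPairs} \times \mathit{cMapCPUCost}
\end{aligned}
\tag{4}
$$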

If the MR job consists only of mappers (i.e. pNumReducers = 0), then the spilling and merging 
phases will not be executed and the map output will be written directly to HDFS. 

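$$ \mathit{outMapSize} = \mathit{inputMapSize} \times \mathit{sMapSizeSel} \tag{5} $$

$$ \mathit{IOCost}_{MapWrite} = \mathit{outMapSize} \times \mathit{sOutCompressRatio} \times \mathit{cHdfsWriteCost} \tag{6} $$

$$ \mathit{CPUCost}_{MapWrite} = \mathit{outMapSize} \times \mathit{cOutComprCPUCost} \tag{7} $$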
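
As a worked illustration of the Read and Map models for a map-only job, the following Python sketch evaluates equations (2) to (7) for a single map task. The function name and the returned dictionary are assumptions made for this example; the arguments are the variables of Tables 1 to 3, and costs come out in whatever time unit the cost factors use.

```python
def map_only_task_costs(pSplitSize, sInputCompressRatio, sInputPairWidth,
                        sMapSizeSel, sOutCompressRatio,
                        cHdfsReadCost, cInUncomprCPUCost, cMapCPUCost,
                        cHdfsWriteCost, cOutComprCPUCost):
    """Evaluate equations (2)-(7) for one map task of a map-only job."""
    inputMapSize = pSplitSize / sInputCompressRatio              # (2)
    inputMapPairs = inputMapSize / sInputPairWidth               # (3)
    io_read = pSplitSize * cHdfsReadCost                         # (4), I/O part
    cpu_read = (pSplitSize * cInUncomprCPUCost
                + inputMapPairs * cMapCPUCost)                   # (4), CPU part
    outMapSize = inputMapSize * sMapSizeSel                      # (5)
    io_write = outMapSize * sOutCompressRatio * cHdfsWriteCost   # (6)
    cpu_write = outMapSize * cOutComprCPUCost                    # (7)
    return {"io": io_read + io_write, "cpu": cpu_read + cpu_write}
```

The two totals correspond to the map-only case (pNumReducers = 0) of the overall map task model, where only the read and the HDFS write contribute.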
2.2 Modeling the Collect and Spill Phases

The map function generates output key-value (K-V) pairs that are placed in the map-side memory
buffer. The formulas regarding the map output are:

$$ \mathit{outMapSize} = \mathit{inputMapSize} \times \mathit{sMapSizeSel} \tag{8} $$

$$ \mathit{outMapPairs} = \mathit{inputMapPairs} \times \mathit{sMapPairsSel} \tag{9} $$

$$ \mathit{outPairWidth} = \frac{\mathit{outMapSize}}{\mathit{outMapPairs}} \tag{10} $$

The memory buffer is split into two parts: the serialization part that stores the key-value pairs, 
and the accounting part that stores metadata per pair. When either of these two parts fills up 
(based on the threshold value pSpillPerc), the pairs are partitioned, sorted, and spilled to disk. 
The maximum number of pairs for the serialization buffer is: 



$$ \mathit{maxSerPairs} = \left\lfloor \frac{\mathit{pSortMB} \times 2^{20} \times (1 - \mathit{pSortRecPerc}) \times \mathit{pSpillPerc}}{\mathit{outPairWidth}} \right\rfloor \tag{11} $$

The maximum number of pairs for the accounting buffer is:

$$ \mathit{maxAccPairs} = \left\lfloor \frac{\mathit{pSortMB} \times 2^{20} \times \mathit{pSortRecPerc} \times \mathit{pSpillPerc}}{16} \right\rfloor \tag{12} $$

Hence, the number of pairs and the size of the buffer before a spill will be:

$$ \mathit{spillBufferPairs} = \min\{\mathit{maxSerPairs},\ \mathit{maxAccPairs},\ \mathit{outMapPairs}\} \tag{13} $$

$$ \mathit{spillBufferSize} = \mathit{spillBufferPairs} \times \mathit{outPairWidth} \tag{14} $$

The overall number of spills will be:

$$ \mathit{numSpills} = \left\lceil \frac{\mathit{outMapPairs}}{\mathit{spillBufferPairs}} \right\rceil \tag{15} $$

The number of pairs and the size of each spill depend on the width of each K-V pair, the use of the
combine function, and the use of intermediate data compression. Note that sIntermCompressRatio
is set to 1 by default if intermediate compression is disabled, and that sCombineSizeSel and
sCombinePairsSel are set to 1 by default if no combine function is used.

$$ \mathit{spillFilePairs} = \mathit{spillBufferPairs} \times \mathit{sCombinePairsSel} \tag{16} $$

$$ \mathit{spillFileSize} = \mathit{spillBufferSize} \times \mathit{sCombineSizeSel} \times \mathit{sIntermCompressRatio} \tag{17} $$
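
The Collect and Spill dataflow can be sketched in Python as follows. This is a sketch under stated assumptions rather than a definitive implementation: the function name is hypothetical, the map output is assumed to be non-empty, and the pair counts and the number of spills are rounded to whole values (floor for the buffer capacities, ceiling for the number of spills).

```python
import math

def spill_dataflow(outMapSize, outMapPairs, pSortMB, pSortRecPerc, pSpillPerc,
                   sCombineSizeSel=1.0, sCombinePairsSel=1.0,
                   sIntermCompressRatio=1.0):
    """Evaluate equations (10)-(17) for a single map task."""
    outPairWidth = outMapSize / outMapPairs                        # (10)
    buffer_bytes = pSortMB * 2**20
    # Assumption: capacities are rounded down to whole pairs.
    maxSerPairs = math.floor(
        buffer_bytes * (1 - pSortRecPerc) * pSpillPerc / outPairWidth)  # (11)
    maxAccPairs = math.floor(
        buffer_bytes * pSortRecPerc * pSpillPerc / 16)             # (12)
    spillBufferPairs = min(maxSerPairs, maxAccPairs, outMapPairs)  # (13)
    spillBufferSize = spillBufferPairs * outPairWidth              # (14)
    # Assumption: a partially filled last buffer still counts as one spill.
    numSpills = math.ceil(outMapPairs / spillBufferPairs)          # (15)
    spillFilePairs = spillBufferPairs * sCombinePairsSel           # (16)
    spillFileSize = (spillBufferSize * sCombineSizeSel
                     * sIntermCompressRatio)                       # (17)
    return {
        "spillBufferPairs": spillBufferPairs,
        "spillBufferSize": spillBufferSize,
        "numSpills": numSpills,
        "spillFilePairs": spillFilePairs,
        "spillFileSize": spillFileSize,
    }
```

The sketch makes visible which of the two buffer parts is the bottleneck: whichever of maxSerPairs and maxAccPairs is smaller determines how often the task spills.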
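The costs of this phase are:

$$ \mathit{IOCost}_{Spill} = \mathit{numSpills} \times \mathit{spillFileSize} \times \mathit{cLocalIOCost} \tag{18} $$

$$
\begin{aligned}
\mathit{CPUCost}_{Spill} = \mathit{numSpills} \times \Big[\ & \mathit{spillBufferPairs} \times \mathit{cPartitionCPUCost} \\
& + \mathit{spillBufferPairs} \times \mathit{cSerdeCPUCost} \\
& + \mathit{spillBufferPairs} \times \log_2\!\left(\frac{\mathit{spillBufferPairs}}{\mathit{pNumReducers}}\right) \times \mathit{cSortCPUCost} \\
& + \mathit{spillBufferPairs} \times \mathit{cCombineCPUCost} \\
& + \mathit{spillBufferSize} \times \mathit{sCombineSizeSel} \times \mathit{cIntermComprCPUCost}\ \Big]
\end{aligned}
\tag{19}
$$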



2.3 Modeling the Merge Phase 

The goal of the merge phase is to merge all the spill files into a single output file, which is written 
to local disk. The merge phase will occur only if more than one spill file is created. Multiple merge
passes might occur, depending on the pSortFactor parameter. We define a merge pass to be the 
merging of at most pSortFactor spill files. We define a merge round to be one or more merge passes 
that merge only spills produced by the spill phase or a previous merge round. For example, suppose 
numSpills = 30 and pSortFactor = 10. Then, 3 merge passes will be performed to create 3 new 
files. This is the first merge round. Then, the 3 new files will be merged together forming the 2nd 
and final merge round. 

The final merge pass is unique in the sense that if the number of spills to be merged is greater than 
or equal to pNumSpillsForComb, the combiner will be used again. Hence, we treat the intermediate 
merge rounds and the final merge separately. For the intermediate merge passes, we calculate how 
many times (on average) a single spill will be read. 

Note that the remainder of this section assumes $\mathit{numSpills} \le \mathit{pSortFactor}^2$. In the opposite
case, we must use a simulation-based approach in order to calculate the number of spills merged
during the intermediate merge rounds as well as the total number of merge passes.

The first merge pass is also unique because Hadoop will calculate the optimal number of spill files 
to merge so that all other merge passes will merge exactly pSortFactor files. 

Since the Reduce task also contains a similar Merge Phase, we define the following three methods 
to reuse later: 



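$$
\mathit{calcNumSpillsFirstPass}(N, F) =
\begin{cases}
N, & \text{if } N \le F \\
F, & \text{if } (N - 1) \bmod (F - 1) = 0 \\
(N - 1) \bmod (F - 1) + 1, & \text{otherwise}
\end{cases}
\tag{20}
$$

$$
\mathit{calcNumSpillsIntermMerge}(N, F) =
\begin{cases}
0, & \text{if } N \le F \\
P + \left\lfloor \frac{N - P}{F} \right\rfloor \times F, & \text{if } N \le F^2
\end{cases}
\quad \text{where } P = \mathit{calcNumSpillsFirstPass}(N, F)
\tag{21}
$$

$$
\mathit{calcNumSpillsFinalMerge}(N, F) =
\begin{cases}
N, & \text{if } N \le F \\
1 + \left\lfloor \frac{N - P}{F} \right\rfloor + (N - S), & \text{if } N \le F^2
\end{cases}
\tag{22}
$$

where P = calcNumSpillsFirstPass(N, F) and S = calcNumSpillsIntermMerge(N, F).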
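
The three helper methods can be sketched in Python as follows. This is an illustrative sketch that assumes N ≤ F², F ≥ 2, floor division for the number of full merge passes, and that the intermediate merging reads the first-pass spills plus F files for every further full pass; the snake_case function names are Python renderings of the method names used above.

```python
def calc_num_spills_first_pass(n, f):
    """Number of spills merged in the first pass, so that every later
    pass can merge exactly f files."""
    if n <= f:
        return n
    remainder = (n - 1) % (f - 1)
    return f if remainder == 0 else remainder + 1


def calc_num_spills_interm_merge(n, f):
    """Number of spills read during the intermediate merge passes."""
    if n <= f:
        return 0  # a single (final) pass merges everything
    p = calc_num_spills_first_pass(n, f)
    # Assumption: each full pass after the first reads exactly f files.
    return p + ((n - p) // f) * f


def calc_num_spills_final_merge(n, f):
    """Number of files merged by the final merge pass."""
    if n <= f:
        return n
    p = calc_num_spills_first_pass(n, f)
    s = calc_num_spills_interm_merge(n, f)
    # One file from the first pass, one per further full pass, and the
    # spills that no intermediate pass has touched.
    return 1 + (n - p) // f + (n - s)
```

For 30 spills and a sort factor of 10, these functions return 3 spills for the first pass, 23 spills read during intermediate merging, and 10 files for the final merge.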

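The number of spills read during the first merge pass is:

$$ \mathit{numSpillsFirstPass} = \mathit{calcNumSpillsFirstPass}(\mathit{numSpills}, \mathit{pSortFactor}) \tag{23} $$

The number of spills read during the intermediate merging is:

$$ \mathit{numSpillsIntermMerge} = \mathit{calcNumSpillsIntermMerge}(\mathit{numSpills}, \mathit{pSortFactor}) \tag{24} $$

The total number of merge passes will be:

$$
\mathit{numMergePasses} =
\begin{cases}
0, & \text{if } \mathit{numSpills} = 1 \\
1, & \text{if } \mathit{numSpills} \le \mathit{pSortFactor} \\
2 + \left\lfloor \frac{\mathit{numSpills} - \mathit{numSpillsFirstPass}}{\mathit{pSortFactor}} \right\rfloor, & \text{if } \mathit{numSpills} \le \mathit{pSortFactor}^2
\end{cases}
\tag{25}
$$

The number of spill files for the final merge round is (first pass + intermediate passes + remaining
file spills):

$$ \mathit{numSpillsFinalMerge} = \mathit{calcNumSpillsFinalMerge}(\mathit{numSpills}, \mathit{pSortFactor}) \tag{26} $$

The total number of records spilled is:

$$ \mathit{numRecSpilled} = \mathit{spillFilePairs} \times [\,\mathit{numSpills} + \mathit{numSpillsIntermMerge} + \mathit{numSpills} \times \mathit{sCombinePairsSel}\,] \tag{27} $$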
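
As a quick check of the merge-pass count, the following Python sketch evaluates it for given numSpills and pSortFactor. The function name is hypothetical, and the sketch assumes floor division for the number of full passes and raises an error outside the range numSpills ≤ pSortFactor², where a simulation-based approach is needed instead.

```python
def num_merge_passes(num_spills, sort_factor):
    """Total number of merge passes for one map task."""
    if num_spills == 1:
        return 0  # a single spill needs no merging
    if num_spills <= sort_factor:
        return 1  # everything fits in one (final) pass
    if num_spills > sort_factor ** 2:
        raise ValueError("outside the analytical range; simulate instead")
    # Size of the first pass, chosen so later passes merge sort_factor files.
    remainder = (num_spills - 1) % (sort_factor - 1)
    first_pass = sort_factor if remainder == 0 else remainder + 1
    # First pass + full intermediate passes + final pass.
    return 2 + (num_spills - first_pass) // sort_factor
```

For numSpills = 30 and pSortFactor = 10, the first pass merges 3 spills, two further passes merge 10 files each, and the final pass merges the remaining 10 files, for a total of 4 passes.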

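The final map output size and number of K-V pairs are:

$$ \mathit{useCombInMerge} = (\mathit{numSpills} > 1)\ \text{AND}\ (\mathit{pUseCombine})\ \text{AND}\ (\mathit{numSpillsFinalMerge} \ge \mathit{pNumSpillsForComb}) \tag{28} $$

$$
\mathit{intermDataSize} = \mathit{numSpills} \times \mathit{spillFileSize} \times
\begin{cases}
\mathit{sCombineSizeSel}, & \text{if } \mathit{useCombInMerge} \\
1, & \text{otherwise}
\end{cases}
\tag{29}
$$

$$
\mathit{intermDataPairs} = \mathit{numSpills} \times \mathit{spillFilePairs} \times
\begin{cases}
\mathit{sCombinePairsSel}, & \text{if } \mathit{useCombInMerge} \\
1, & \text{otherwise}
\end{cases}
\tag{30}
$$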
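The costs of this phase are:

$$
\begin{aligned}
\mathit{IOCost}_{Merge} =\ & 2 \times \mathit{numSpillsIntermMerge} \times \mathit{spillFileSize} \times \mathit{cLocalIOCost} && \text{(intermediate merges)} \\
& + \mathit{numSpills} \times \mathit{spillFileSize} \times \mathit{cLocalIOCost} && \text{(read final merge)} \\
& + \mathit{intermDataSize} \times \mathit{cLocalIOCost} && \text{(write final merge)}
\end{aligned}
\tag{31}
$$

$$
\begin{aligned}
\mathit{CPUCost}_{Merge} =\ & \mathit{numSpillsIntermMerge} \times \Big[\ \mathit{spillFileSize} \times \mathit{cIntermUncomprCPUCost} \\
& \qquad + \mathit{spillFilePairs} \times \mathit{cMergeCPUCost} \\
& \qquad + \frac{\mathit{spillFileSize}}{\mathit{sIntermCompressRatio}} \times \mathit{cIntermComprCPUCost}\ \Big] \\
& + \mathit{numSpills} \times \Big[\ \mathit{spillFileSize} \times \mathit{cIntermUncomprCPUCost} \\
& \qquad + \mathit{spillFilePairs} \times \mathit{cMergeCPUCost} \\
& \qquad + \mathit{spillFilePairs} \times \mathit{cCombineCPUCost}\ \Big] \\
& + \frac{\mathit{intermDataSize}}{\mathit{sIntermCompressRatio}} \times \mathit{cIntermComprCPUCost}
\end{aligned}
\tag{32}
$$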

2.4 Modeling the Overall Map Task 

The above models correspond to the execution of a single map task. The overall costs for a single 
map task are: 

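$$
\mathit{IOCost}_{Map} =
\begin{cases}
\mathit{IOCost}_{Read} + \mathit{IOCost}_{MapWrite}, & \text{if } \mathit{pNumReducers} = 0 \\
\mathit{IOCost}_{Read} + \mathit{IOCost}_{Spill} + \mathit{IOCost}_{Merge}, & \text{if } \mathit{pNumReducers} > 0
\end{cases}
\tag{33}
$$

$$
\mathit{CPUCost}_{Map} =
\begin{cases}
\mathit{CPUCost}_{Read} + \mathit{CPUCost}_{MapWrite}, & \text{if } \mathit{pNumReducers} = 0 \\
\mathit{CPUCost}_{Read} + \mathit{CPUCost}_{Spill} + \mathit{CPUCost}_{Merge}, & \text{if } \mathit{pNumReducers} > 0
\end{cases}
\tag{34}
$$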



3 Performance Models for the Reduce Task Phases 

The Reduce Task is divided into four phases: 

1. Shuffle: Copying the map output from the mapper nodes to a reducer's node and
decompressing, if needed. Partial merging may also occur during this phase.

2. Merge: Merging the sorted fragments from the different mappers to form the input to the 
reduce function. 

3. Reduce: Executing the user-provided reduce function. 

4. Write: Writing the (compressed) output to HDFS. 
3.1 Modeling the Shuffle Phase

The following discussion refers to the execution of a single reduce task. In the Shuffle phase, the
framework fetches the relevant map output partition from each mapper (called a segment) and copies
it to the reducer's node. If the map output is compressed, it will be uncompressed. For each map
segment that reaches the reduce side we have:

$$ \mathit{segmentComprSize} = \frac{\mathit{intermDataSize}}{\mathit{pNumReducers}} \tag{35} $$

$$ \mathit{segmentUncomprSize} = \frac{\mathit{segmentComprSize}}{\mathit{sIntermCompressRatio}} \tag{36} $$

$$ \mathit{segmentPairs} = \frac{\mathit{intermDataPairs}}{\mathit{pNumReducers}} \tag{37} $$

where intermDataSize and intermDataPairs are the size and number of pairs produced as
intermediate output by a single mapper (see Section 2.3).

The data fetched to a single reducer will be:

$$ \mathit{totalShuffleSize} = \mathit{pNumMappers} \times \mathit{segmentComprSize} \tag{38} $$

$$ \mathit{totalShufflePairs} = \mathit{pNumMappers} \times \mathit{segmentPairs} \tag{39} $$

As the data is copied to the reducer, it is placed in the in-memory shuffle buffer, whose size is:

$$ \mathit{shuffleBufferSize} = \mathit{pShuffleInBufPerc} \times \mathit{pTaskMem} \tag{40} $$

When the in-memory buffer reaches a threshold size, or the number of segments becomes greater
than pInMemMergeThr, the segments are merged and spilled to disk, creating a new local file
(called a shuffleFile). The merge size threshold is:

$$ \mathit{mergeSizeThr} = \mathit{pShuffleMergePerc} \times \mathit{shuffleBufferSize} \tag{41} $$

However, when the segment size is greater than 25% of shuffleBufferSize, the segment will go
straight to disk instead of passing through memory (hence, no in-memory merging will occur).

Case 1: segmentUncomprSize < 0.25 × shuffleBufferSize

$$ \mathit{numSegInShuffleFile} = \frac{\mathit{mergeSizeThr}}{\mathit{segmentUncomprSize}} \tag{42} $$

    If (⌈numSegInShuffleFile⌉ × segmentUncomprSize ≤ shuffleBufferSize)
        numSegInShuffleFile = ⌈numSegInShuffleFile⌉
    else
        numSegInShuffleFile = ⌊numSegInShuffleFile⌋
    If (numSegInShuffleFile > pInMemMergeThr)
        numSegInShuffleFile = pInMemMergeThr        (43)

A shuffle file is the result of merging numSegInShuffleFile segments. If a combine function is
specified, then it is applied during this merging. Note that if numSegInShuffleFile > pNumMappers,
then merging will not happen.

$$ \mathit{shuffleFileSize} = \mathit{numSegInShuffleFile} \times \mathit{segmentComprSize} \times \mathit{sCombineSizeSel} \tag{44} $$

$$ \mathit{shuffleFilePairs} = \mathit{numSegInShuffleFile} \times \mathit{segmentPairs} \times \mathit{sCombinePairsSel} \tag{45} $$

$$ \mathit{numShuffleFiles} = \left\lfloor \frac{\mathit{pNumMappers}}{\mathit{numSegInShuffleFile}} \right\rfloor \tag{46} $$

At the end of the merging, some segments might remain in memory.

$$ \mathit{numSegmentsInMem} = \mathit{pNumMappers} \bmod \mathit{numSegInShuffleFile} \tag{47} $$

Case 2: segmentUncomprSize > 0.25 × shuffleBufferSize

$$ \mathit{numSegInShuffleFile} = 1 \tag{48} $$

$$ \mathit{shuffleFileSize} = \mathit{segmentComprSize} \tag{49} $$

$$ \mathit{shuffleFilePairs} = \mathit{segmentPairs} \tag{50} $$

$$ \mathit{numShuffleFiles} = \mathit{pNumMappers} \tag{51} $$

$$ \mathit{numSegmentsInMem} = 0 \tag{52} $$
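
The two shuffle cases can be sketched in Python as follows. This is a sketch under stated assumptions: the function name and the returned dictionary are hypothetical, the number of shuffle files is rounded down to a whole number, and a segment of exactly 25% of the buffer is treated as Case 2.

```python
import math

def shuffle_files(pNumMappers, segmentComprSize, segmentUncomprSize,
                  segmentPairs, shuffleBufferSize, mergeSizeThr,
                  pInMemMergeThr, sCombineSizeSel=1.0, sCombinePairsSel=1.0):
    """Evaluate equations (42)-(52) for a single reduce task."""
    if segmentUncomprSize < 0.25 * shuffleBufferSize:
        # Case 1: segments pass through the in-memory shuffle buffer.
        n = mergeSizeThr / segmentUncomprSize                       # (42)
        if math.ceil(n) * segmentUncomprSize <= shuffleBufferSize:
            n = math.ceil(n)
        else:
            n = math.floor(n)
        n = min(n, pInMemMergeThr)                                  # (43)
        return {
            "numSegInShuffleFile": n,
            "shuffleFileSize": n * segmentComprSize * sCombineSizeSel,  # (44)
            "shuffleFilePairs": n * segmentPairs * sCombinePairsSel,    # (45)
            "numShuffleFiles": pNumMappers // n,                    # (46)
            "numSegmentsInMem": pNumMappers % n,                    # (47)
        }
    # Case 2: every segment goes straight to disk as its own shuffle file.
    return {
        "numSegInShuffleFile": 1,                                   # (48)
        "shuffleFileSize": segmentComprSize,                        # (49)
        "shuffleFilePairs": segmentPairs,                           # (50)
        "numShuffleFiles": pNumMappers,                             # (51)
        "numSegmentsInMem": 0,                                      # (52)
    }
```

When numSegInShuffleFile exceeds pNumMappers in Case 1, the sketch returns zero shuffle files and leaves every segment in memory, which matches the remark that no merging happens in that situation.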

Either case will create a set of shuffle files on disk. When the number of shuffle files on disk
increases above a certain threshold (2 × pSortFactor − 1), a new merge thread is triggered and
pSortFactor shuffle files are merged into a new, larger sorted one. The combiner is not used during
this disk merging. The total number of such merges is:

$$
\mathit{numShuffleMerges} =
\begin{cases}
0, & \text{if } \mathit{numShuffleFiles} < 2 \times \mathit{pSortFactor} - 1 \\
\left\lfloor \frac{\mathit{numShuffleFiles} - 2 \times \mathit{pSortFactor} + 1}{\mathit{pSortFactor}} \right\rfloor + 1, & \text{otherwise}
\end{cases}
\tag{53}
$$

At the end of the Shuffle phase, a set of merged and unmerged shuffle files will exist on disk.

$$ \mathit{numMergShufFiles} = \mathit{numShuffleMerges} \tag{54} $$
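
The count of on-disk merges can be sketched in Python as follows. The sketch assumes that one merge is triggered as soon as the threshold 2 × pSortFactor − 1 is reached, plus one more for every further pSortFactor shuffle files, and that each merge consumes pSortFactor shuffle files; the function name is hypothetical.

```python
def num_shuffle_merges(numShuffleFiles, pSortFactor):
    """Return (number of on-disk merges, number of unmerged shuffle files)."""
    threshold = 2 * pSortFactor - 1
    if numShuffleFiles < threshold:
        merges = 0
    else:
        # Assumption: one merge at the threshold, then one per pSortFactor files.
        merges = (numShuffleFiles - threshold) // pSortFactor + 1
    # Each merge is modeled as consuming pSortFactor shuffle files.
    unmerged = numShuffleFiles - pSortFactor * merges
    return merges, unmerged
```

With pSortFactor = 10, 18 shuffle files trigger no merge, 19 files trigger one merge and leave 9 files unmerged, and 29 files trigger two merges.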
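$$ \mathit{mergShufFileSize} = \mathit{pSortFactor} \times \mathit{shuffleFileSize} \tag{55} $$

$$ \mathit{mergShufFilePairs} = \mathit{pSortFactor} \times \mathit{shuffleFilePairs} \tag{56} $$

$$ \mathit{numUnmergShufFiles} = \mathit{numShuffleFiles} - \mathit{pSortFactor} \times \mathit{numShuffleMerges} \tag{57} $$

$$ \mathit{unmergShufFileSize} = \mathit{shuffleFileSize} \tag{58} $$

$$ \mathit{unmergShufFilePairs} = \mathit{shuffleFilePairs} \tag{59} $$

The cost of the Shuffle phase is:

$$
\begin{aligned}
\mathit{IOCost}_{Shuffle} =\ & \mathit{numShuffleFiles} \times \mathit{shuffleFileSize} \times \mathit{cLocalIOCost} \\
& + \mathit{numMergShufFiles} \times \mathit{mergShufFileSize} \times 2 \times \mathit{cLocalIOCost}
\end{aligned}
\tag{60}
$$

$$
\begin{aligned}
\mathit{CPUCost}_{Shuffle} =\ & \Big[\ \mathit{totalShuffleSize} \times \mathit{cIntermUncomprCPUCost} \\
& \quad + \mathit{numShuffleFiles} \times \mathit{shuffleFilePairs} \times \mathit{cMergeCPUCost} \\
& \quad + \mathit{numShuffleFiles} \times \mathit{shuffleFilePairs} \times \mathit{cCombineCPUCost} \\
& \quad + \mathit{numShuffleFiles} \times \frac{\mathit{shuffleFileSize}}{\mathit{sIntermCompressRatio}} \times \mathit{cIntermComprCPUCost}\ \Big] \\
& \quad \times I(\mathit{segmentUncomprSize} < 0.25 \times \mathit{shuffleBufferSize}) \\
& + \mathit{numMergShufFiles} \times \mathit{mergShufFileSize} \times \mathit{cIntermUncomprCPUCost} \\
& + \mathit{numMergShufFiles} \times \mathit{mergShufFilePairs} \times \mathit{cMergeCPUCost} \\
& + \mathit{numMergShufFiles} \times \frac{\mathit{mergShufFileSize}}{\mathit{sIntermCompressRatio}} \times \mathit{cIntermComprCPUCost}
\end{aligned}
\tag{61}
$$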

3.2 Modeling the Merge Phase 

After all the map outputs have been successfully copied into memory and/or onto disk, the
sorting/merging phase begins. This phase will merge all data into a single stream that is fed to
the reducer. Similar to the Map Merge phase (see Section 2.3), this phase may occur in multiple
rounds, but during the final merging, instead of creating a single output file, it will send the data
directly to the reducer.



The shuffle phase produced a set of merged and unmerged shuffle files on disk, and perhaps a set 
of segments in memory. The merging is done in three steps. 

Step 1: Some segments might be evicted from memory and merged into a single shuffle file to satisfy
the memory constraint enforced by pReducerInBufPerc. (This parameter specifies the amount of
memory allowed to be occupied by segments before the reducer begins.)
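
$$ \mathit{maxSegmentBuffer} = \mathit{pReducerInBufPerc} \times \mathit{pTaskMem} \tag{62} $$

$$ \mathit{currSegmentBuffer} = \mathit{numSegmentsInMem} \times \mathit{segmentUncomprSize} \tag{63} $$

$$
\mathit{numSegmentsEvicted} =
\begin{cases}
\left\lceil \frac{\mathit{currSegmentBuffer} - \mathit{maxSegmentBuffer}}{\mathit{segmentUncomprSize}} \right\rceil, & \text{if } \mathit{currSegmentBuffer} > \mathit{maxSegmentBuffer} \\
0, & \text{otherwise}
\end{cases}
\tag{64}
$$

$$ \mathit{numSegmentsRemainMem} = \mathit{numSegmentsInMem} - \mathit{numSegmentsEvicted} \tag{65} $$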
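
The eviction rule of Step 1 can be sketched in Python as follows. The sketch assumes that the number of evicted segments is rounded up to a whole segment, so that the remaining segments fit within the allowed memory; the function name is hypothetical.

```python
import math

def segments_after_eviction(numSegmentsInMem, segmentUncomprSize,
                            pReducerInBufPerc, pTaskMem):
    """Return (numSegmentsEvicted, numSegmentsRemainMem) per (62)-(65)."""
    maxSegmentBuffer = pReducerInBufPerc * pTaskMem               # (62)
    currSegmentBuffer = numSegmentsInMem * segmentUncomprSize     # (63)
    if currSegmentBuffer > maxSegmentBuffer:
        # Assumption: evict whole segments until the buffer limit is met.
        numSegmentsEvicted = math.ceil(
            (currSegmentBuffer - maxSegmentBuffer) / segmentUncomprSize)  # (64)
    else:
        numSegmentsEvicted = 0
    return numSegmentsEvicted, numSegmentsInMem - numSegmentsEvicted     # (65)
```

With pReducerInBufPerc set to 0, every in-memory segment is evicted before the reduce function starts.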



The above merging will only occur if the number of existing shuffle files on disk is less than
pSortFactor. If not, the shuffle files have to be merged anyway, and the in-memory segments
that are supposed to be evicted are left to be merged with the shuffle files on disk.

$$ \mathit{numFilesOnDisk} = \mathit{numMergShufFiles} + \mathit{numUnmergShufFiles} \tag{66} $$

    If (numFilesOnDisk < pSortFactor)
        numFilesFromMem = 1
        filesFromMemSize = numSegmentsEvicted × segmentComprSize
        filesFromMemPairs = numSegmentsEvicted × segmentPairs
        step1MergingSize = filesFromMemSize
        step1MergingPairs = filesFromMemPairs
    else
        numFilesFromMem = numSegmentsEvicted
        filesFromMemSize = segmentComprSize
        filesFromMemPairs = segmentPairs
        step1MergingSize = 0
        step1MergingPairs = 0        (67)

$$ \mathit{filesToMergeStep2} = \mathit{numFilesOnDisk} + \mathit{numFilesFromMem} \tag{68} $$

Step 2: Any files on disk will go through a merging phase in multiple rounds (similar to the process
in Section 2.3). This step will happen only if numFilesOnDisk > 0 (which implies
filesToMergeStep2 > 0). The number of intermediate reads (and writes) is:

$$ \mathit{intermMergeReads} = \mathit{calcNumSpillsIntermMerge}(\mathit{filesToMergeStep2}, \mathit{pSortFactor}) \tag{69} $$
The main difference from Section 2.3 is that the merged files have different sizes. We account for
this by attributing merging costs proportionally.

$$
\begin{aligned}
\mathit{step2MergingSize} = \frac{\mathit{intermMergeReads}}{\mathit{filesToMergeStep2}} \times \big[\ & \mathit{numMergShufFiles} \times \mathit{mergShufFileSize} \\
& + \mathit{numUnmergShufFiles} \times \mathit{unmergShufFileSize} \\
& + \mathit{numFilesFromMem} \times \mathit{filesFromMemSize}\ \big]
\end{aligned}
\tag{70}
$$

$$
\begin{aligned}
\mathit{step2MergingPairs} = \frac{\mathit{intermMergeReads}}{\mathit{filesToMergeStep2}} \times \big[\ & \mathit{numMergShufFiles} \times \mathit{mergShufFilePairs} \\
& + \mathit{numUnmergShufFiles} \times \mathit{unmergShufFilePairs} \\
& + \mathit{numFilesFromMem} \times \mathit{filesFromMemPairs}\ \big]
\end{aligned}
\tag{71}
$$

$$ \mathit{filesRemainFromStep2} = \mathit{calcNumSpillsFinalMerge}(\mathit{filesToMergeStep2}, \mathit{pSortFactor}) \tag{72} $$
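
The proportional attribution used for the Step 2 merging size can be sketched in Python as follows. The function name and the `groups` argument, a list of (file count, file size) pairs for the merged shuffle files, the unmerged shuffle files, and the files evicted from memory, are assumptions made for this example.

```python
def step2_merging_size(intermMergeReads, filesToMergeStep2, groups):
    """Attribute the intermediate merge reads proportionally across files.

    `groups` is an iterable of (file_count, file_size) pairs; the total
    on-disk size is scaled by the fraction of files read during the
    intermediate merges.
    """
    if filesToMergeStep2 == 0:
        return 0.0  # nothing on disk, so Step 2 does not run
    total_size = sum(count * size for count, size in groups)
    return intermMergeReads / filesToMergeStep2 * total_size
```

The same helper applies to the pair counts by passing (file count, pairs per file) tuples instead of sizes.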
Step 3: All files on disk and in memory will go through merging.

$$ \mathit{filesToMergeStep3} = \mathit{filesRemainFromStep2} + \mathit{numSegmentsRemainMem} \tag{73} $$

The process is identical to Step 2 above.

$$ \mathit{intermMergeReads} = \mathit{calcNumSpillsIntermMerge}(\mathit{filesToMergeStep3}, \mathit{pSortFactor}) \tag{74} $$

$$ \mathit{step3MergingSize} = \frac{\mathit{intermMergeReads}}{\mathit{filesToMergeStep3}} \times \mathit{totalShuffleSize} \tag{75} $$

$$ \mathit{step3MergingPairs} = \frac{\mathit{intermMergeReads}}{\mathit{filesToMergeStep3}} \times \mathit{totalShufflePairs} \tag{76} $$

$$ \mathit{filesRemainFromStep3} = \mathit{calcNumSpillsFinalMerge}(\mathit{filesToMergeStep3}, \mathit{pSortFactor}) \tag{77} $$

$$ \mathit{totalMergingSize} = \mathit{step1MergingSize} + \mathit{step2MergingSize} + \mathit{step3MergingSize} \tag{78} $$

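The cost of the Sorting phase is:

$$ \mathit{IOCost}_{Sort} = \mathit{totalMergingSize} \times \mathit{cLocalIOCost} \tag{79} $$

$$
\begin{aligned}
\mathit{CPUCost}_{Sort} =\ & \mathit{totalMergingSize} \times \mathit{cMergeCPUCost} \\
& + \frac{\mathit{totalMergingSize}}{\mathit{sIntermCompressRatio}} \times \mathit{cIntermComprCPUCost} \\
& + [\,\mathit{step2MergingSize} + \mathit{step3MergingSize}\,] \times \mathit{cIntermUncomprCPUCost}
\end{aligned}
\tag{80}
$$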
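3.3 Modeling the Reduce and Write Phases

Finally, the user-provided reduce function will be executed and the output will be written to HDFS.

$$ \mathit{inReduceSize} = \frac{\mathit{numShuffleFiles} \times \mathit{shuffleFileSize}}{\mathit{sIntermCompressRatio}} + \frac{\mathit{numSegmentsInMem} \times \mathit{segmentComprSize}}{\mathit{sIntermCompressRatio}} \tag{81} $$

$$ \mathit{inReducePairs} = \mathit{numShuffleFiles} \times \mathit{shuffleFilePairs} + \mathit{numSegmentsInMem} \times \mathit{segmentPairs} \tag{82} $$

$$ \mathit{outReduceSize} = \mathit{inReduceSize} \times \mathit{sReduceSizeSel} \tag{83} $$

$$ \mathit{outReducePairs} = \mathit{inReducePairs} \times \mathit{sReducePairsSel} \tag{84} $$

The input to the reduce function resides in memory and/or in the shuffle files produced by the
Shuffling and Sorting phases.

$$
\begin{aligned}
\mathit{inRedSizeDiskSize} =\ & \mathit{numMergShufFiles} \times \mathit{mergShufFileSize} \\
& + \mathit{numUnmergShufFiles} \times \mathit{unmergShufFileSize} \\
& + \mathit{numFilesFromMem} \times \mathit{filesFromMemSize}
\end{aligned}
\tag{85}
$$

The costs of the Reduce and Write phases are:

$$
\begin{aligned}
\mathit{IOCost}_{Write} =\ & \mathit{inRedSizeDiskSize} \times \mathit{cLocalIOCost} \\
& + \mathit{outReduceSize} \times \mathit{sOutCompressRatio} \times \mathit{cHdfsWriteCost}
\end{aligned}
\tag{86}
$$

$$
\begin{aligned}
\mathit{CPUCost}_{Write} =\ & \mathit{inReducePairs} \times \mathit{cReduceCPUCost} \\
& + \mathit{inRedSizeDiskSize} \times \mathit{cIntermUncomprCPUCost} \\
& + \mathit{outReduceSize} \times \mathit{cOutComprCPUCost}
\end{aligned}
\tag{87}
$$

3.4 Modeling the Overall Reduce Task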

The above models correspond to the execution of a single reduce task. The overall costs for a single 
reduce task, excluding network transfers, are: 



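$$ \mathit{IOCost}_{Reduce} = \mathit{IOCost}_{Shuffle} + \mathit{IOCost}_{Sort} + \mathit{IOCost}_{Write} \tag{88} $$

$$ \mathit{CPUCost}_{Reduce} = \mathit{CPUCost}_{Shuffle} + \mathit{CPUCost}_{Sort} + \mathit{CPUCost}_{Write} \tag{89} $$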
4 Performance Models for the Network Transfer

During the shuffle phase, all the data produced by the map tasks is copied over to the nodes running
the reduce tasks (except for the data that is local). The overall data transferred in the network is:

$$ \mathit{netTransferSize} = \mathit{finalOutMapSize} \times \mathit{pNumMappers} \times \frac{\mathit{pNumNodes} - 1}{\mathit{pNumNodes}} \tag{90} $$

where finalOutMapSize is the size of data produced by a single map task.
The overall cost for transferring data over the network is:

$$ \mathit{NETCost}_{Job} = \mathit{netTransferSize} \times \mathit{cNetworkCost} \tag{91} $$
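
The network model can be sketched in Python as follows. The sketch assumes that map outputs are spread uniformly over the nodes, so that a fraction (pNumNodes − 1) / pNumNodes of the data crosses the network, and that the per-byte cost is the cNetworkCost factor of Table 3; the function name is hypothetical.

```python
def network_cost(finalOutMapSize, pNumMappers, pNumNodes, cNetworkCost):
    """Evaluate the network transfer size and its cost for a whole job."""
    # Fraction of map output that is not local to the reducer's node.
    remote_fraction = (pNumNodes - 1) / pNumNodes
    netTransferSize = finalOutMapSize * pNumMappers * remote_fraction
    return netTransferSize * cNetworkCost
```

On a single-node cluster the remote fraction is zero, so the sketch reports no network cost.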

5 Performance Models for the Map-Reduce Job 

The MapReduce job consists of several map and reduce tasks executing in parallel and in waves. 
There are two primary ways of estimating the total costs of the job: (i) simulate the task execution
using a Task Scheduler Simulator, and (ii) calculate the expected total costs analytically.

Simulation involves scheduling and simulating the execution of individual tasks on a virtual
cluster. The cost for each task is calculated using the proposed performance models.

The second approach involves using the following analytical costs: 

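$$ \mathit{IOCost}_{AllMaps} = \frac{\mathit{pNumMappers} \times \mathit{IOCost}_{Map}}{\mathit{pNumNodes} \times \mathit{pMaxMapsPerNode}} \tag{92} $$

$$ \mathit{CPUCost}_{AllMaps} = \frac{\mathit{pNumMappers} \times \mathit{CPUCost}_{Map}}{\mathit{pNumNodes} \times \mathit{pMaxMapsPerNode}} \tag{93} $$

$$ \mathit{IOCost}_{AllReducers} = \frac{\mathit{pNumReducers} \times \mathit{IOCost}_{Reduce}}{\mathit{pNumNodes} \times \mathit{pMaxRedPerNode}} \tag{94} $$

$$ \mathit{CPUCost}_{AllReducers} = \frac{\mathit{pNumReducers} \times \mathit{CPUCost}_{Reduce}}{\mathit{pNumNodes} \times \mathit{pMaxRedPerNode}} \tag{95} $$

The overall job cost is simply the sum of the costs from all the map and the reduce tasks.

$$
\mathit{IOCost}_{Job} =
\begin{cases}
\mathit{IOCost}_{AllMaps}, & \text{if } \mathit{pNumReducers} = 0 \\
\mathit{IOCost}_{AllMaps} + \mathit{IOCost}_{AllReducers}, & \text{if } \mathit{pNumReducers} > 0
\end{cases}
\tag{96}
$$

$$
\mathit{CPUCost}_{Job} =
\begin{cases}
\mathit{CPUCost}_{AllMaps}, & \text{if } \mathit{pNumReducers} = 0 \\
\mathit{CPUCost}_{AllMaps} + \mathit{CPUCost}_{AllReducers}, & \text{if } \mathit{pNumReducers} > 0
\end{cases}
\tag{97}
$$

With appropriate system parameters that allow for equal comparisons among the I/O, CPU, and
network costs, the overall cost is:

$$ \mathit{Cost}_{Job} = \mathit{IOCost}_{Job} + \mathit{CPUCost}_{Job} + \mathit{NETCost}_{Job} \tag{98} $$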
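
The analytical job-level aggregation can be sketched in Python as follows. The function name and argument names are hypothetical; the per-task costs are assumed to have been computed with the map and reduce task models above, and all cost factors are assumed to be expressed in comparable units so that the three components can be added.

```python
def job_cost(io_map, cpu_map, io_reduce, cpu_reduce, net_cost,
             pNumMappers, pNumReducers, pNumNodes,
             pMaxMapsPerNode=2, pMaxRedPerNode=2):
    """Combine per-task costs into the overall analytical job cost."""
    map_slots = pNumNodes * pMaxMapsPerNode
    io_all_maps = pNumMappers * io_map / map_slots
    cpu_all_maps = pNumMappers * cpu_map / map_slots
    if pNumReducers == 0:
        # Map-only job: the reduce side contributes nothing.
        io_job, cpu_job = io_all_maps, cpu_all_maps
    else:
        reduce_slots = pNumNodes * pMaxRedPerNode
        io_job = io_all_maps + pNumReducers * io_reduce / reduce_slots
        cpu_job = cpu_all_maps + pNumReducers * cpu_reduce / reduce_slots
    return io_job + cpu_job + net_cost
```

Dividing by the number of task slots reflects that tasks run in parallel across the cluster, so the total is closer to an elapsed-time estimate than to a sum of resource usage.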